Strings and Runes and Bytes
Overview
Go gives you two ways to look at a string: as bytes (its UTF-8 encoding) and as runes (Unicode code points). For the first 128 characters, UTF-8 and ASCII use identical codes, so each of those characters is a single byte whose value equals its code point. Beyond that range the two diverge: UTF-8 is not simply the code point written as a number, but a multi-byte encoding of it.
Caveat: Go uses the word "byte", but for text those bytes are the UTF-8 encoding, not one byte per character! (A string can technically hold arbitrary bytes; string literals in Go source are UTF-8.)
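A small sketch of this caveat: indexing a string yields individual bytes, while a for-range loop decodes the UTF-8 and yields runes. The helper names byteValues and runeValues are made up for this illustration.

```go
package main

import "fmt"

// byteValues returns the raw bytes of s, one per index position.
func byteValues(s string) []byte {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		out = append(out, s[i]) // s[i] is a byte, not a character
	}
	return out
}

// runeValues returns the code points of s; range decodes UTF-8 as it goes.
func runeValues(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	s := "é"
	fmt.Println(byteValues(s)) // [195 169]
	fmt.Println(runeValues(s)) // [233]
}
```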
UTF-8 byte representations
0xxxxxxx - 1-byte representation
110xxxxx - first byte of a 2-byte representation
1110xxxx - first byte of a 3-byte representation
11110xxx - first byte of a 4-byte representation
10xxxxxx - continuation byte (every byte after the first)
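The prefix table above can be turned into a few bit masks. This is only a sketch: seqLen is a hypothetical helper that looks at the prefix bits alone, so it does not reject lead bytes that real UTF-8 forbids for other reasons (such as overlong encodings).

```go
package main

import "fmt"

// seqLen reports how many bytes a UTF-8 sequence occupies, judging only by
// the prefix bits of its first byte. It returns 0 for a continuation byte
// (10xxxxxx) or any other byte that cannot start a sequence.
func seqLen(b byte) int {
	switch {
	case b&0x80 == 0x00: // 0xxxxxxx
		return 1
	case b&0xE0 == 0xC0: // 110xxxxx
		return 2
	case b&0xF0 == 0xE0: // 1110xxxx
		return 3
	case b&0xF8 == 0xF0: // 11110xxx
		return 4
	default:
		return 0
	}
}

func main() {
	// Print each byte of the string in binary next to its classification.
	for _, b := range []byte("Aé😊") {
		fmt.Printf("%08b -> %d\n", b, seqLen(b))
	}
}
```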
How UTF-8 bytes become a character
UTF-8 bytes
↓
Strip the prefix bits (110, 10, etc.)
↓
Combine the remaining data bits
↓
That gives you the Unicode code point
↓
Look up the character
Example
é = U+00E9 = decimal 233
233 in binary = 11101001
Does it fit in the 1-byte UTF-8 format (0xxxxxxx)?
No. That format has only 7 data bits, and 233 needs 8.
So use the 2-byte UTF-8 format:
110xxxxx 10xxxxxx
That format has 5 + 6 = 11 data bits, so pad 233 to 11 bits and split it into 5 and 6:
00011 101001
Put those bits into the x positions of the 2-byte format:
11000011 (00011 added) 10101001 (101001 added)
Convert the two UTF-8 bytes to decimal:
195 169
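The same encoding steps can be written as two shifts and masks. This is a minimal sketch: encode2 is a hypothetical helper that only handles code points in the 2-byte range (U+0080 to U+07FF) and does no range checking. The result is cross-checked against utf8.EncodeRune from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode2 packs a code point in the range U+0080..U+07FF into the
// two-byte pattern 110xxxxx 10xxxxxx. It does no range checking.
func encode2(r rune) [2]byte {
	return [2]byte{
		0xC0 | byte(r>>6),   // 110 + top 5 data bits
		0x80 | byte(r&0x3F), // 10 + low 6 data bits
	}
}

func main() {
	b := encode2('é')
	fmt.Printf("%08b %08b\n", b[0], b[1]) // 11000011 10101001
	fmt.Println(b[0], b[1])               // 195 169

	// Cross-check against the standard library encoder.
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, 'é')
	fmt.Println(buf[:n]) // [195 169]
}
```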
To get the Unicode code point back from the bytes
The computer reads byte C3 = 11000011
It starts with 110 → "This is a 2-byte character"
So it reads the next byte too: A9 = 10101001
Strip the prefixes
11000011 10101001
^^^      ^^
110      10   ← prefixes, throw away
Remaining data bits: 00011 + 101001 = 00011101001 = 233 decimal = E9 hex
Look up Unicode code point
U+00E9 = é
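The decoding walk-through can be sketched the same way. decode2 is a hypothetical helper that assumes a valid 2-byte sequence and does no validation; utf8.DecodeRune from the standard library is used as a cross-check.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decode2 strips the 110 and 10 prefixes from a two-byte sequence and
// joins the remaining 5 + 6 data bits. It assumes the input is valid.
func decode2(b0, b1 byte) rune {
	return rune(b0&0x1F)<<6 | rune(b1&0x3F)
}

func main() {
	r := decode2(0xC3, 0xA9)
	fmt.Printf("U+%04X %d %c\n", r, r, r) // U+00E9 233 é

	// Cross-check against the standard library decoder.
	std, size := utf8.DecodeRune([]byte{0xC3, 0xA9})
	fmt.Println(std == r, size) // true 2
}
```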
Runes
A rune is a Unicode code point stored directly as a number (in Go, rune is an alias for int32). For example:
é = U+00E9 = decimal 233
r := []rune("é")
// r[0] is 233
s := "Aé😊"
// String (UTF-8 bytes):
// A = 1 byte (65)
// é = 2 bytes (195, 169)
// 😊 = 4 bytes (240, 159, 152, 138)
// len(s) == 7 (total bytes)
// Runes (code points):
r := []rune(s)
// r[0] = 65 (A)
// r[1] = 233 (é)
// r[2] = 128522 (😊)
// len(r) == 3 (3 code points)
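Here is the comparison above as a runnable sketch. The counts helper is made up for this note; utf8.RuneCountInString is the standard library way to count code points without building a []rune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and the code point count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "Aé😊"
	fmt.Println([]byte(s)) // [65 195 169 240 159 152 138]
	fmt.Println([]rune(s)) // [65 233 128522]
	fmt.Println(counts(s)) // 7 3
}
```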
Memo
Most data in Go is read and written as a sequence of bytes, so the most common string type conversions are back and forth with a slice of bytes. Slices of runes are uncommon.
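As an illustration of that byte-slice round trip, here is a minimal sketch. upperASCII is a hypothetical helper, not a replacement for strings.ToUpper: it only touches ASCII letters and leaves the bytes of multi-byte characters alone.

```go
package main

import "fmt"

// upperASCII converts s to bytes, uppercases ASCII letters in place,
// and converts back. Bytes of multi-byte characters are left untouched.
func upperASCII(s string) string {
	b := []byte(s) // copies the string's bytes into a mutable slice
	for i, c := range b {
		if c >= 'a' && c <= 'z' {
			b[i] = c - 'a' + 'A'
		}
	}
	return string(b) // copies back into an immutable string
}

func main() {
	fmt.Println(upperASCII("café")) // CAFé
}
```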